Book Search Experiments: Investigating IR Methods for the Indexing and Retrieval of Books
Internal identifier: 000C65 (Main/Exploration); previous: 000C64; next: 000C66
Authors: Hengzhi Wu [United Kingdom]; Gabriella Kazai [United Kingdom]; Michael Taylor [United Kingdom]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2008.
Abstract
Through mass-digitization projects and with the use of OCR technologies, digitized books are becoming available on the Web and in digital libraries. The unprecedented scale of these efforts, the unique characteristics of the digitized material as well as the unexplored possibilities of user interactions make full-text book search an exciting area of information retrieval (IR) research. Emerging research questions include: How appropriate and effective are traditional IR models when applied to books? What book specific features (e.g., back-of-book index) should receive special attention during the indexing and retrieval processes? How can we tackle scalability? In order to answer such questions, we developed an experimental platform to facilitate rapid prototyping of a book search system as well as to support large-scale tests. Using this system, we performed experiments on a collection of 10 000 books, evaluating the efficiency of a novel multi-field inverted index and the effectiveness of the BM25F retrieval model adapted to books, using book-specific fields.
DOI: 10.1007/978-3-540-78646-7_23
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000255
- to stream Istex, to step Curation: 000251
- to stream Istex, to step Checkpoint: 000732
- to stream Main, to step Merge: 000C77
- to stream Main, to step Curation: 000C65
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Book Search Experiments: Investigating IR Methods for the Indexing and Retrieval of Books</title>
<author><name sortKey="Wu, Hengzhi" sort="Wu, Hengzhi" uniqKey="Wu H" first="Hengzhi" last="Wu">Hengzhi Wu</name>
</author>
<author><name sortKey="Kazai, Gabriella" sort="Kazai, Gabriella" uniqKey="Kazai G" first="Gabriella" last="Kazai">Gabriella Kazai</name>
</author>
<author><name sortKey="Taylor, Michael" sort="Taylor, Michael" uniqKey="Taylor M" first="Michael" last="Taylor">Michael Taylor</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:8C6184460C675464B92E71BB4961244B26710894</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1007/978-3-540-78646-7_23</idno>
<idno type="url">https://api.istex.fr/document/8C6184460C675464B92E71BB4961244B26710894/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000255</idno>
<idno type="wicri:Area/Istex/Curation">000251</idno>
<idno type="wicri:Area/Istex/Checkpoint">000732</idno>
<idno type="wicri:doubleKey">0302-9743:2008:Wu H:book:search:experiments</idno>
<idno type="wicri:Area/Main/Merge">000C77</idno>
<idno type="wicri:Area/Main/Curation">000C65</idno>
<idno type="wicri:Area/Main/Exploration">000C65</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Book Search Experiments: Investigating IR Methods for the Indexing and Retrieval of Books</title>
<author><name sortKey="Wu, Hengzhi" sort="Wu, Hengzhi" uniqKey="Wu H" first="Hengzhi" last="Wu">Hengzhi Wu</name>
<affiliation wicri:level="4"><country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Department of Computer Science, Queen Mary,University of London</wicri:regionArea>
<placeName><settlement type="city">Londres</settlement>
<region type="country">Angleterre</region>
<region type="région" nuts="1">Grand Londres</region>
</placeName>
<orgName type="university">Université de Londres</orgName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Royaume-Uni</country>
</affiliation>
</author>
<author><name sortKey="Kazai, Gabriella" sort="Kazai, Gabriella" uniqKey="Kazai G" first="Gabriella" last="Kazai">Gabriella Kazai</name>
<affiliation wicri:level="1"><country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Microsoft Research, Cambridge</wicri:regionArea>
<wicri:noRegion>Cambridge</wicri:noRegion>
</affiliation>
<affiliation><wicri:noCountry code="no comma">E-mail: gabkaz@microsoft.com</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Taylor, Michael" sort="Taylor, Michael" uniqKey="Taylor M" first="Michael" last="Taylor">Michael Taylor</name>
<affiliation wicri:level="1"><country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Microsoft Research, Cambridge</wicri:regionArea>
<wicri:noRegion>Cambridge</wicri:noRegion>
</affiliation>
<affiliation><wicri:noCountry code="no comma">E-mail: mitaylor@microsoft.com</wicri:noCountry>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2008</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">8C6184460C675464B92E71BB4961244B26710894</idno>
<idno type="DOI">10.1007/978-3-540-78646-7_23</idno>
<idno type="ChapterID">23</idno>
<idno type="ChapterID">Chap23</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: Through mass-digitization projects and with the use of OCR technologies, digitized books are becoming available on the Web and in digital libraries. The unprecedented scale of these efforts, the unique characteristics of the digitized material as well as the unexplored possibilities of user interactions make full-text book search an exciting area of information retrieval (IR) research. Emerging research questions include: How appropriate and effective are traditional IR models when applied to books? What book specific features (e.g., back-of-book index) should receive special attention during the indexing and retrieval processes? How can we tackle scalability? In order to answer such questions, we developed an experimental platform to facilitate rapid prototyping of a book search system as well as to support large-scale tests. Using this system, we performed experiments on a collection of 10 000 books, evaluating the efficiency of a novel multi-field inverted index and the effectiveness of the BM25F retrieval model adapted to books, using book-specific fields.</div>
</front>
</TEI>
<affiliations><list><country><li>Royaume-Uni</li>
</country>
<region><li>Angleterre</li>
<li>Grand Londres</li>
</region>
<settlement><li>Londres</li>
</settlement>
<orgName><li>Université de Londres</li>
</orgName>
</list>
<tree><country name="Royaume-Uni"><region name="Angleterre"><name sortKey="Wu, Hengzhi" sort="Wu, Hengzhi" uniqKey="Wu H" first="Hengzhi" last="Wu">Hengzhi Wu</name>
</region>
<name sortKey="Kazai, Gabriella" sort="Kazai, Gabriella" uniqKey="Kazai G" first="Gabriella" last="Kazai">Gabriella Kazai</name>
<name sortKey="Taylor, Michael" sort="Taylor, Michael" uniqKey="Taylor M" first="Michael" last="Taylor">Michael Taylor</name>
<name sortKey="Wu, Hengzhi" sort="Wu, Hengzhi" uniqKey="Wu H" first="Hengzhi" last="Wu">Hengzhi Wu</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000C65 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000C65 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:8C6184460C675464B92E71BB4961244B26710894 |texte= Book Search Experiments: Investigating IR Methods for the Indexing and Retrieval of Books }}
This area was generated with Dilib version V0.6.32.